| Provider  | Input ($/M tokens) | Output ($/M tokens) | Context Length (K tokens) |
|-----------|--------------------|---------------------|---------------------------|
| Anthropic | 21                 | 105                 | 200                       |
| Alibaba   | 6                  | 6.4                 | 32                        |
| Huawei    | -                  | -                   | 4                         |
| DeepSeek  | 1                  | 2                   | 8                         |
| Google    | 1.05               | 4.2                 | 1,000                     |
| 01-ai     | -                  | -                   | -                         |
| Baidu     | -                  | -                   | -                         |
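Read literally, these prices bill pro rata per million tokens, so one request costs `(input_tokens × input_price + output_tokens × output_price) / 10^6` in whatever currency unit the table uses. A minimal sketch of that arithmetic, with the prices hardcoded from the rows above (the helper name is ours, for illustration only):

```python
# Illustrative per-request cost from the per-million-token prices above.
PRICES = {
    # provider: (input price per M tokens, output price per M tokens)
    "Anthropic": (21.0, 105.0),
    "Alibaba": (6.0, 6.4),
    "DeepSeek": (1.0, 2.0),
    "Google": (1.05, 4.2),
}

def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, billed pro rata per million tokens."""
    in_price, out_price = PRICES[provider]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 3,000-token prompt with an 800-token completion.
for provider in PRICES:
    print(f"{provider}: {request_cost(provider, 3_000, 800):.4f}")
```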
- **ezelikman** (Quiet-STaR): based on Mistral-7B, this model is continually pretrained with the Quiet-STaR method, generating 8 reasoning tokens before each output token to strengthen its reasoning ability (a decode-time sketch of the idea follows this list).
- **vasugoel** (K-12BERT): a BERT model obtained by continued pretraining on K-12 basic-education data, tailored specifically to educational scenarios (a continued-pretraining sketch also follows below).
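To make the Quiet-STaR entry concrete, here is a heavily simplified decode-time sketch of the idea: before emitting each output token, the model first generates a short hidden "thought", and the next-token distribution is mixed with the thought-free one. The real method uses learned start/end-of-thought embeddings, a trained mixing head, and REINFORCE training; this sketch substitutes greedy plain-token thoughts and a fixed mixing weight, purely for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "mistralai/Mistral-7B-v0.1"  # the base model named in the entry
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

N_THOUGHT, MIX = 8, 0.5  # 8 reasoning tokens per position; fixed mixing weight

@torch.no_grad()
def generate_with_thoughts(prompt: str, max_new_tokens: int = 32) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids.to(model.device)
    for _ in range(max_new_tokens):
        base_logits = model(ids).logits[:, -1, :]        # thought-free prediction
        thought = ids
        for _ in range(N_THOUGHT):                       # generate 8 thought tokens
            nxt = model(thought).logits[:, -1, :].argmax(-1, keepdim=True)
            thought = torch.cat([thought, nxt], dim=-1)
        thought_logits = model(thought).logits[:, -1, :] # prediction after the thought
        mixed = MIX * base_logits + (1 - MIX) * thought_logits
        # Emit the mixed prediction; the thought itself is discarded.
        ids = torch.cat([ids, mixed.argmax(-1, keepdim=True)], dim=-1)
    return tok.decode(ids[0], skip_special_tokens=True)

print(generate_with_thoughts("The sum of the first 10 odd numbers is"))
```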
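Likewise, the K-12BERT entry describes standard domain-adaptive continued pretraining: take a pretrained BERT checkpoint and keep training it with the masked-language-modeling objective on in-domain text. A minimal sketch follows; the corpus path and hyperparameters are illustrative assumptions, not the settings actually used for K-12BERT.

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical corpus of K-12 teaching materials, one document per line.
corpus = load_dataset("text", data_files={"train": "k12_corpus.txt"})["train"]
corpus = corpus.map(lambda b: tok(b["text"], truncation=True, max_length=128),
                    batched=True, remove_columns=["text"])

# Randomly mask 15% of tokens, the standard BERT MLM recipe.
collator = DataCollatorForLanguageModeling(tokenizer=tok, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="k12-bert",
                           per_device_train_batch_size=32,
                           num_train_epochs=1,
                           learning_rate=5e-5),
    train_dataset=corpus,
    data_collator=collator,
)
trainer.train()                 # continued pretraining on the domain corpus
trainer.save_model("k12-bert")  # the adapted, domain-specialized checkpoint
```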